Quantitative description and modeling of real networks. 
In this letter we present data analysis and modeling of two case studies in the field of
growing networks. We analyze a WWW data set and authorship collaboration networks in order to
check for the presence of correlations in the data. The results are reproduced with fairly good
agreement through a suitable modification of the standard AB model of network growth. In particular,
the intrinsic relevance of sites plays a role in determining the future degree of a vertex.



The fractal properties of social networks have been extensively investigated by the
statistical mechanics community in recent times. Many quantities have been recognized
as "signatures" of complexity in such networks. In particular, the probability
distribution of the degree of the nodes in a social network displays an algebraic decay
in several different realizations, including the Internet, the WWW, the movie actors
network and the science collaboration network [?]. Since then, many models have been
developed in order to reproduce this particular feature of real networks [?].

Recent studies have provided a more detailed picture of the connectivity in social
networks. In the Internet Autonomous Systems (IAS) network, the relation between the
degree $k$ of a node and the average degree of its neighbors $k_{nn}(k)$ has been
measured, showing a decaying behavior of $k_{nn}(k)$ for large $k$; this property is
connected to a hierarchical structure of the growth process.

Using a slightly different formalism, it has been shown that a taxonomy of social
networks can be made according to the correlation between the degrees of directly
connected nodes [?]. In networks displaying "assortative (disassortative) mixing", the
correlation is positive (negative), which corresponds to an increasing (a decreasing)
behavior of $k_{nn}(k)$.
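As an illustration of the quantity discussed above, the following minimal sketch computes $k_{nn}(k)$ for an undirected graph. The adjacency representation (a dictionary of neighbor sets) and the function name are assumptions made for this example, not something specified in the text.

```python
from collections import defaultdict

def average_neighbor_degree_by_degree(adjacency):
    """Return {k: k_nn(k)} for an undirected graph given as {node: set(neighbors)}."""
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    sums = defaultdict(float)   # sum over nodes of degree k of their mean neighbor degree
    counts = defaultdict(int)   # number of nodes of degree k
    for node, neigh in adjacency.items():
        k = degree[node]
        if k == 0:
            continue  # isolated nodes have no neighbors to average over
        sums[k] += sum(degree[n] for n in neigh) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy example: a star graph, where the hub (degree 3) sees only leaves (degree 1).
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
knn = average_neighbor_degree_by_degree(star)
```

In this toy graph $k_{nn}(k)$ decreases with $k$, which is the disassortative pattern; a function that increases with $k$ would indicate assortative mixing.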

Moreover, a growing body of research deals with the clustering properties of social
networks, that is, the presence and the abundance of groups of nodes having a strong
internal connectivity. There is no unique physical quantity for studying the clustering
properties: directed and undirected graphs, indeed, require different approaches. In the
undirected case, the clustering coefficient, i.e. the average fraction of neighbors of a
node that are also directly connected to each other, is usually measured.

Recent surveys on the IAS have measured the clustering coefficient $C_k$ around nodes
of degree $k$. These empirical studies show a decaying behavior of $C_k$ with respect
to $k$, as in the case of $k_{nn}(k)$.
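A minimal sketch of how $C_k$ could be computed in the undirected case is given below. It assumes the same hypothetical dictionary-of-sets representation of the graph, and it skips nodes of degree below 2, for which the fraction of linked neighbor pairs is undefined.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adjacency):
    """Return {k: C_k} for an undirected graph given as {node: set(neighbors)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, neigh in adjacency.items():
        k = len(neigh)
        if k < 2:
            continue  # no neighbor pairs to test
        # Count pairs of neighbors that are themselves directly connected.
        linked_pairs = sum(1 for a, b in combinations(neigh, 2) if b in adjacency[a])
        sums[k] += linked_pairs / (k * (k - 1) / 2)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
ck = clustering_by_degree(graph)
```

Here the two nodes of degree 2 have all their neighbor pairs linked, while the node of degree 3 has only one linked pair out of three.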

The same quantities can be measured in directed graphs, though the generalization to
this case may be somewhat arbitrary. In principle, one could consider either the
incoming or the outgoing links in finding the neighbors of a node to measure their
degree and their mutual connectivity. This way, the number of directed links within a
group of nodes may be greater than the number of pairs of nodes, thus leading to a
clustering coefficient greater than one.

Another, simpler way to generalize such a method to directed graphs is to take one-way
links as bidirectional ones, and to consider the resulting undirected graphs, where the
traditional definitions apply. We neglect the fact that some pairs of nodes may
actually be mutually linked, and we replace these two directed links with a single
undirected one. In the WWW database we used [?], for example, about one fifth of the
links are reciprocal. We have adopted this technique to measure both $k_{nn}(k)$ and
$C_k$, finding qualitatively the same results as in the IAS (undirected) case studied
in [?]. Both quantities behave as a power law for large $k$, with decay exponents close
to the ones measured in [?].
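The symmetrization described above can be sketched as follows. The input format (a list of `(source, target)` pairs), the function names, and the choice to drop self-links are assumptions made for this illustration.

```python
def symmetrize(directed_links):
    """Turn directed links into an undirected adjacency {node: set(neighbors)}.

    A reciprocal pair (a -> b, b -> a) collapses into a single undirected link.
    """
    adjacency = {}
    for source, target in directed_links:
        if source == target:
            continue  # assumption: self-links are ignored
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency

def reciprocal_fraction(directed_links):
    """Fraction of distinct directed links whose reverse link is also present."""
    links = {(s, t) for s, t in directed_links if s != t}
    if not links:
        return 0.0
    return sum(1 for s, t in links if (t, s) in links) / len(links)

links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
undirected = symmetrize(links)
fraction = reciprocal_fraction(links)
```

The resulting adjacency can then be fed to any undirected measurement of $k_{nn}(k)$ or $C_k$.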

Standard noise-reducing data analysis techniques show that $k_{nn}(k) \sim k^{-0.76}$
for large $k$, as shown in Figure 1, and $C_k \sim k^{-1.03}$, see Figure 2.

This behavior is in qualitative agreement with the power laws found in the IAS case
[?], though the exponents are slightly different: according to those measurements,
which are affected by weaker noise, $k_{nn}(k) \sim k^{-0.5}$ and $C_k \sim k^{-0.75}$.
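The text does not spell out which noise-reducing techniques were used; the figure captions mention 10-point averaging. The sketch below shows one plausible procedure, under that assumption: average consecutive points in blocks of 10, then fit a straight line in log-log scale by least squares. The function names and the synthetic data are hypothetical.

```python
import math

def block_average(xs, ys, block=10):
    """Average consecutive (x, y) points in blocks; xs is assumed sorted ascending."""
    points = []
    for start in range(0, len(xs) - block + 1, block):
        xb = xs[start:start + block]
        yb = ys[start:start + block]
        points.append((sum(xb) / block, sum(yb) / block))
    return points

def loglog_slope(points):
    """Least-squares slope of log(y) versus log(x), ignoring non-positive values."""
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    n = len(logs)
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    return sxy / sxx

# Synthetic check on an exact power law with exponent -0.76.
ks = list(range(10, 1010))
values = [k ** -0.76 for k in ks]
slope = loglog_slope(block_average(ks, values))
```

On noiseless synthetic data the fitted slope should stay close to the input exponent; on real data the fit range would have to be restricted to the large-$k$ region.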

This phenomenon, however, is not ubiquitous. Indeed, it is observed in networks where
the decision to connect a pair of nodes depends on only one of the two nodes. We claim
that the distinction between "assortative" and "disassortative" mixing, as introduced
in [?], relies on this particular property of the microscopic growth mechanism.

In the WWW and in the IAS networks, the link growth 
mechanism is strictly local, lacking any outer supervision. 
In this case, each node is free to choose highly relevant 
neighbors. 

On the other hand, networks with assortative mixing are often examples of networks
where a single node has no power to choose its neighbors. For example, in the actor
collaboration network film directors decide the link structure, and the nodes, the
actors, have no power to direct their connectivity. Due to economic constraints,
expensive celebrities are often balanced by less relevant actors, biasing the
connectivity correlation. A qualitative difference is found in $k_{nn}(k)$ and in $C_k$
for this particular network. As shown in Figure 1, $k_{nn}(k)$ grows with $k$, in
contrast with the decay observed in the IAS and in the WWW. This confirms what has been
empirically found in [?], where the WWW appears among the networks with disassortative
mixing, whereas the actor collaboration network has assortative mixing. $C_k$, however,
displays a decreasing behavior, though it does not seem to follow a power law as in the
IAS and WWW cases.

Our claim is reinforced by other cases of assortative mixing studied in [?], like the
scientific collaboration networks. Indeed, to establish a scientific collaboration the
agreement of both scientists is needed, and a single node of such networks is not free
to choose its neighbors.

To check whether our hypothesis is true, we introduce a growing undirected network
model. Sites are added at discrete time steps, and each site has an intrinsic
"relevance", which is a random variable drawn from a uniform distribution in the range
[0, 1].

In our interpretation, a link is a relevance attribution to the pointed node, in the
spirit of [?]. In the WWW, for example, a relevant web page rarely points to a
non-relevant one, suggesting a relevance-driven connectivity concentration. To
implement such a policy, in our model a node added at time $t$ with a relevance $r_t$
can be connected only to nodes having a relevance higher than $r_t$, with linear
preferential attachment: the probability of acquiring a new link is proportional to the
current degree.

This implies that an existing node $i$ with a relevance $r_i$ and degree $k_i$ has a
probability $p_i = \Theta(r_i - r_t)\, k_i / \sum_s \Theta(r_s - r_t)\, k_s$ of
acquiring a new link, where $\Theta(x) = 1$ for $x > 0$ and $\Theta(x) = 0$ otherwise.
Finally, we assume that a newly added node is connected to $m$ existing nodes according
to the described rule.
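The growth rule just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' simulation code: the initial condition (a complete graph on $m + 1$ nodes), the handling of steps where fewer than $m$ eligible nodes exist, and the exclusion of zero-degree nodes from the candidates are all choices made here.

```python
import random

def grow_relevance_network(n_nodes, m=2, seed=0):
    """Grow an undirected network where new nodes link only to more relevant ones."""
    rng = random.Random(seed)
    # Assumption: the growth starts from a complete graph on m + 1 nodes.
    relevance = [rng.random() for _ in range(m + 1)]
    adjacency = [set(range(m + 1)) - {i} for i in range(m + 1)]
    for t in range(m + 1, n_nodes):
        r_t = rng.random()
        # Eligible targets: strictly higher relevance and nonzero degree
        # (a zero-degree node has zero weight under linear preferential attachment).
        candidates = [i for i in range(t) if relevance[i] > r_t and adjacency[i]]
        weights = [len(adjacency[i]) for i in candidates]
        targets = set()
        # Assumption: with fewer than m eligible nodes, link to all of them.
        while len(targets) < min(m, len(candidates)):
            targets.add(rng.choices(candidates, weights=weights, k=1)[0])
        relevance.append(r_t)
        adjacency.append(set(targets))
        for i in targets:
            adjacency[i].add(t)
    return relevance, adjacency

relevance, adjacency = grow_relevance_network(300, m=2, seed=1)
degrees = [len(neigh) for neigh in adjacency]
```

The candidate list is rebuilt at every step, so the cost grows quadratically with the number of nodes; this is acceptable for a sketch but would need a faster sampling scheme for large networks.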

Let us call $k_i(t)$ the degree at time $t$ of the node $i$ introduced at time $i$,
whose "relevance" is $r_i$. At each time step, there is a probability $r_i$ that the
newly introduced node has a relevance $r_t < r_i$, since $r_t$ is drawn from a uniform
distribution between 0 and 1. Then, the probability of increasing by 1 the degree
$k_i(r_i, t)$ is approximately given by

$$ \langle p_i \rangle_{r_t} = \frac{r_i\, k_i(t)}{\left\langle \sum_{s:\, r_s > r_t} k_s(t) \right\rangle_{r_t}} \tag{1} $$



where $\langle \cdot \rangle_{r_t}$ denotes the average over all the realizations of
$r_t$. In the following, we will neglect the explicit time dependence whenever
unnecessary. We can write a rate equation for the degree, following the reasoning made
in [?]. To evaluate the denominator in the r.h.s. of the equation above, we have to
compute

$$ \left\langle \sum_{s:\, r_s > r_t} k_s(t) \right\rangle_{r_t} = \int_0^{r_i} dr_t\, A(r_t, t), \tag{3} $$

where we defined

$$ A(r_t, t) = \sum_{s:\, r_s > r_t} k_s(t). \tag{4} $$
$A(r_t, t)$ is a decreasing function of $r_t$. If $r_t = 0$, $A(r_t, t)$ is the sum of
the degrees of all the nodes at time $t$, i.e. $A(0, t) = 2mt$; on the other hand, if
$r_t = 1$ the sum in the definition of $A(r_t, t)$ does not contain any term;
therefore, $A(1, t) = 0$. Thus, we assume

$$ A(r_t, t) = 2mt \left( 1 - r_t^{\alpha(t)} \right) \tag{5} $$



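as the general functional form of $A(r_t, t)$. By this ansatz, we can compute the
r.h.s. of eq. (3),

$$ \int_0^{r_i} dr_t\, A(r_t, t) = 2mt\, r_i \left( 1 - \frac{r_i^{\alpha(t)}}{1 + \alpha(t)} \right). \tag{6} $$

Let us assume that

$$ \alpha = \lim_{t \to \infty} \alpha(t). \tag{7} $$

In this case, we can define

$$ C(r) = 1 - \frac{r^{\alpha}}{1 + \alpha}. \tag{8} $$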



Therefore, for large $t$ the rate equation takes the form

$$ \dot{k}_i = \frac{r_i\, k_i}{2t\, C(r_i)}, $$

which admits the solution

$$ k_i(t) = m \left( \frac{t}{i} \right)^{\frac{r_i}{2C(r_i)}} \tag{9} $$

for the time evolution of the degree, following the same reasoning as in [?].
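As a worked check of this step, assume the large-$t$ rate equation has the separable form $\dot{k}_i = r_i k_i / (2t\, C(r_i))$ and that node $i$ enters at time $i$ with degree $m$ (the initial condition $k_i(i) = m$ is the usual one for this kind of growth model and is taken here as an assumption). Separating variables gives:

```latex
\begin{align*}
\frac{dk_i}{k_i} &= \frac{r_i}{2C(r_i)}\,\frac{dt}{t}, \\
\int_{m}^{k_i(t)} \frac{dk}{k} &= \frac{r_i}{2C(r_i)} \int_{i}^{t} \frac{dt'}{t'}, \\
\ln\frac{k_i(t)}{m} &= \frac{r_i}{2C(r_i)}\,\ln\frac{t}{i}, \\
k_i(t) &= m\left(\frac{t}{i}\right)^{r_i/(2C(r_i))}.
\end{align*}
```

The exponent $r_i / (2C(r_i))$ grows with the relevance, so under these assumptions more relevant nodes accumulate degree faster than less relevant ones introduced at the same time.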

Let us now call $K(r, t)\,dr$ the sum of the degrees of the nodes with relevance
between $r$ and $r + dr$, at time $t$. At each time step, $dr$ nodes on average are
introduced with such a relevance. Eq. (9) gives us the degree acquired by each of these
nodes. To obtain $K(r, t)$ we have to sum over all time steps from 1 to $t$, and we get

$$ dr\, K(r, t) = dr \sum_{s=1}^{t} k_s(t). \tag{10} $$

If $s$ is continuous, the sum becomes an integral

$$ dr\, K(r, t) = m\, dr \int_1^t ds \left( \frac{t}{s} \right)^{\frac{r}{2C(r)}}, \tag{11} $$

which implies, for large $t$,

$$ K(r, t) = \frac{mt}{1 - \frac{r}{2C(r)}}. \tag{12} $$



We can estimate $\alpha$ by integrating $K(r, t)$ over all $r$, thus obtaining the
total sum of the nodes' degrees:

$$ \int_0^1 dr\, K(r, t) = 2mt. \tag{13} $$



Therefore, using the expression of $K(r, t)$ we can write the following equation for
$\alpha$,

$$ \int_0^1 dr\, \frac{1 + \alpha - r^{\alpha}}{(2 - r)(1 + \alpha) - 2r^{\alpha}} = 1. \tag{14} $$

This equation can be solved numerically, and yields $\alpha = 1.3837$. The hypothesis
made in equation (5) is verified in simulations of the model, as shown in Figure 3.

Following [13], we compute the statistical distribution of the degree $P(k)$ from its
time evolution. The probability $P(k_i(t) > k)$ that a randomly chosen node $i$ has a
degree higher than $k$ at time $t$ is equal to the probability that the node has been
introduced in the network at a time $i < t\,(k/m)^{-2C(r_i)/r_i}$, as one may verify by
solving the time evolution of $k_i$ with respect to $i$. Since nodes are added at a
uniform pace, we have

$$ P(k_i(t) > k) = \left( \frac{k}{m} \right)^{-\frac{2C(r_i)}{r_i}}. \tag{15} $$



By definition, the probability distribution of the degree of nodes with relevance $r_i$
is $P(k, r_i) = -\frac{\partial}{\partial k} P(k_i(t) > k)$.

The total degree distribution, regardless of the relevance of the node, is obtained by
averaging $P(k, r_i)$ with respect to the uniform distribution of $r_i$:

$$ P(k) = -\int_0^1 dr_i\, \frac{\partial}{\partial k} P(k_i(t) > k). \tag{16} $$

After replacing the kernel of this integral by its explicit expression obtained above,
we get

$$ P(k) = \frac{1}{k} \int_0^1 dr \left( \frac{k}{m} \right)^{-B(r)} B(r), \tag{17} $$

where $B(r) = \frac{2}{r} \left( 1 - \frac{r^{\alpha}}{1 + \alpha} \right)$.

We can estimate the power-law exponent of the degree distribution $P(k)$ by finding
upper and lower bounds for its integral expression. Indeed, we find that

$$ F(r) = \left( \frac{k}{m} \right)^{-B(r)} B(r) \tag{18} $$

is such that the integrand is monotonically growing. Therefore it is easily seen that,
for $m = 2$,

$$ P(k) < \frac{1}{k}\, e^{-\ln(k/2)\, B(1)}\, B(1) \propto k^{-\frac{3\alpha + 1}{\alpha + 1}}. \tag{19} $$
As for the lower bound, we first observe that the integrand is monotonically
increasing, with positive second derivative. So,

$$ F_1(r) = F(1) - F'(1)(1 - r) \tag{20} $$

is such that $F_1(r) < F(r)$ for $0 < r < 1$. If we then extend the integral from $r_1$
(where $F_1(r_1) = 0$) to 1, we surely find an underestimate for $P(k)$. In particular
we find

$$ P(k) > k^{-\frac{3\alpha + 1}{\alpha + 1}}\, \frac{\mathrm{const}}{B(1) \ln(k/2) - 1}. \tag{21} $$



The asymptotic behavior of $P(k)$ is therefore $k^{-\frac{3\alpha + 1}{\alpha + 1}}$
with at most logarithmic corrections.
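The size of the logarithmic correction can be explored numerically. The sketch below assumes that the degree distribution has the integral form $P(k) = \frac{1}{k} \int_0^1 dr\, (k/m)^{-B(r)} B(r)$ with $B(r) = \frac{2}{r}\left(1 - \frac{r^{\alpha}}{1+\alpha}\right)$; both forms are reconstructed assumptions, as are the function names, the midpoint integration rule and the choice of sample points.

```python
import math

ALPHA = 1.3837  # value of alpha quoted in the text
M = 2           # number of links per new node used in the simulations

def B(r):
    # Assumed form of B(r); diverges as r -> 0, so the integrand vanishes there.
    return (2.0 / r) * (1.0 - r ** ALPHA / (1.0 + ALPHA))

def degree_pdf(k, steps=20000):
    """Midpoint-rule estimate of (1/k) * integral_0^1 (k/M)**(-B(r)) * B(r) dr."""
    log_ratio = math.log(k / M)
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        b = B((j + 0.5) * h)
        total += math.exp(-b * log_ratio) * b
    return total * h / k

def local_exponent(k, factor=1.1):
    """Finite-difference estimate of d ln P / d ln k at degree k."""
    return math.log(degree_pdf(k * factor) / degree_pdf(k)) / math.log(factor)

asymptotic_exponent = -(1.0 + B(1.0))  # equals -(3*ALPHA + 1) / (ALPHA + 1)
slope_at_1000 = local_exponent(1000.0)
```

Comparing `local_exponent(k)` at several values of `k` with `asymptotic_exponent` gives a feel for how slowly the correction fades under these assumptions.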

We numerically checked that $P(k)$ is a power law with a rather weak correction that
slows down the decay, as displayed in Figure 3. Neglecting the correction, the best
approximating exponent of the PDF is about $-2.16$, which confirms the above
computation. Indeed, we have $\frac{3\alpha + 1}{\alpha + 1} = 2.16$.
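The quoted value can be checked directly from $\alpha = 1.3837$, taking the exponent to be $(3\alpha + 1)/(\alpha + 1)$:

```python
alpha = 1.3837  # numerical value of alpha quoted in the text
exponent = (3 * alpha + 1) / (alpha + 1)
rounded = round(exponent, 2)
```

To two decimal places this reproduces the exponent 2.16 reported for the simulated degree distribution.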

This value, moreover, is close to the exponents one measures in real networks, which
lie in the range 2 to 2.4.

In the simulation of the model, $k_{nn}(k)$ and $C_k$ have also been numerically
investigated. Unfortunately, we could not find an analytical description of these two
quantities. As observed in real data, $k_{nn}(k)$ and $C_k$ decay algebraically with
respect to $k$. For the nearest-neighbor degree, we approximately measured a power-law
decay of $k_{nn}(k)$, as shown in Figure 1. The value of the exponent agrees with the
measurement reported in [?], which yields $k_{nn}(k) \sim k^{-\nu_k}$ with
$\nu_k = 0.5 \pm 0.1$.

As for the clustering coefficient $C_k$, simulations reported in Figure 2 show that
$C_k \sim k^{-0.72}$. The same relation, measured in [?] for the IAS network, reads
$C_k \sim k^{-\omega}$ with $\omega = 0.75 \pm 0.03$.

The qualitative behavior of these quantities is reproduced by our extremely simple
model. As a comparison, let us recall that, without an intrinsic relevance, a simple
growing network model with preferential attachment shows no correlation between the
degrees of two linked nodes. In addition, in such models the clustering coefficient
around a node does not depend on the degree of the node [?].

An improvement in approximating real data could be achieved by adding other microscopic
interactions to the dynamics of our toy model, such as rewiring and elimination of
links, or merging of nodes, as already done in former works [?] in the search for a
better approximation of the scale-free degree distribution.

We believe that our analysis has pointed out some key structural features of social
networks, through the observation of the correlation and the clustering of the
connectivity in networks. In particular, the nontrivial behavior of the nearest-neighbor
average degree and of the clustering coefficient has been measured in some real
examples. We also provided a toy model of a growing network with preferential
attachment, where nodes only connect to more relevant ones. We have shown, numerically
and, as far as we could, analytically, that our model reproduces qualitatively the
statistical properties of real networks, including the correlations in the
connectivity. We believe that this approach suggests new empirical measurements to be
carried out on real networks, and calls for further analytical steps in the
comprehension of these complex systems.
FIG. 1. Average degree $k_{nn}(k)$ of nearest neighbors of a node with degree $k$, as a
function of $k$. Triangles refer to the actor collaboration network, plus symbols refer
to the WWW empirical survey (10-point averaged), circles to simulations of our model.
FIG. 2. Clustering coefficient around a node of degree $k$ as a function of $k$.
Circles refer to the actor collaboration network, plus symbols refer to the WWW
empirical survey (10-point averaged) and squares to the simulation of our model.










FIG. 3. Degree PDF in our network model made of $10^4$ nodes, with $m = 2$. Plus
symbols refer to numerical simulation. The solid line is obtained by plotting eq. (17).
The dashed line is proportional to $k^{-2.16}$. Inset: The function $A(x, t)/2mt$
plotted for $t = 10^4$ and $m = 2$. The dashed line represents $x^{1.38}$, displayed
here to check the validity of our ansatz.